[reland][ROCm] preshuffled weight mm #2044

Merged: 24 commits, Apr 22, 2025

jeffdaily
Contributor

Adds a SwizzleTensor subclass that wraps a Tensor and reorders its contents to suit HIPBLASLT_ORDER_COL16_4R8. SwizzleTensor intercepts torch.mm and replaces it with custom calls to hipBLASLt.
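
To make the pre-shuffling idea concrete, here is a minimal C++ sketch under loud assumptions. It uses a generic column-tile reordering with a hypothetical tile width `kTile`, not the actual HIPBLASLT_ORDER_COL16_4R8 layout, and a plain CPU loop instead of a hipBLASLt call. The names `swizzle`, `mm_swizzled`, and `mm_reference` are illustrative and do not come from this PR. It only shows the general pattern: reorder the weight once up front, then run a matmul that reads the reordered buffer directly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile width for illustration only.
// This is NOT the HIPBLASLT_ORDER_COL16_4R8 layout.
constexpr std::size_t kTile = 4;

// Reorder a row-major K x N matrix so that each group of kTile columns is
// stored contiguously (tile after tile, row-major inside each tile).
// Assumes N is a multiple of kTile.
std::vector<float> swizzle(const std::vector<float>& b, std::size_t K, std::size_t N) {
    std::vector<float> out(K * N);
    std::size_t idx = 0;
    for (std::size_t t = 0; t < N / kTile; ++t)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t j = 0; j < kTile; ++j)
                out[idx++] = b[k * N + t * kTile + j];
    return out;
}

// C = A * B, where A is row-major M x K and b_swz is the output of swizzle().
// The inner loops walk each tile of B linearly, which is the point of
// paying the reordering cost once ahead of time.
std::vector<float> mm_swizzled(const std::vector<float>& a, const std::vector<float>& b_swz,
                               std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t t = 0; t < N / kTile; ++t) {
        const float* tile = b_swz.data() + t * K * kTile;
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t j = 0; j < kTile; ++j)
                    c[m * N + t * kTile + j] += a[m * K + k] * tile[k * kTile + j];
    }
    return c;
}

// Plain row-major reference used to check the swizzled path.
std::vector<float> mm_reference(const std::vector<float>& a, const std::vector<float>& b,
                                std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t n = 0; n < N; ++n)
                c[m * N + n] += a[m * K + k] * b[k * N + n];
    return c;
}
```

For any shapes where N is a multiple of `kTile`, `mm_swizzled(a, swizzle(b, K, N), M, K, N)` is expected to match `mm_reference(a, b, M, K, N)`. The real layout and kernel are defined by hipBLASLt and are more involved than this.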

pytorch-bot bot commented Apr 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2044

Note: Links to docs will display an error until the docs builds have been completed.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 11, 2025
@jeffdaily
Contributor Author

@mxz297 @jerryzh168 please re-review and kick off CI, thanks.

@mxz297

mxz297 commented Apr 13, 2025

@jeffdaily "test-mps-ops" still seems to be failing to compile with

  /Users/ec2-user/runner/_work/ao/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:1:10: fatal error: 'hip/hip_runtime.h' file not found
  #include <hip/hip_runtime.h>
           ^~~~~~~~~~~~~~~~~~~

I wonder if we should just guard the whole source file under #if USE_ROCM

@jeffdaily
Contributor Author

@jeffdaily "test-mps-ops" still seems to be failing to compile with

  /Users/ec2-user/runner/_work/ao/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:1:10: fatal error: 'hip/hip_runtime.h' file not found
  #include <hip/hip_runtime.h>
           ^~~~~~~~~~~~~~~~~~~

I wonder if we should just guard the whole source file under #if USE_ROCM

Done.
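
The guard discussed above can be sketched roughly as follows. This is an illustration, not the actual contents of swizzle.cpp: the helper `swizzle_backend` and the `#else` stub are hypothetical, added only so the sketch has something observable on both paths. The key point is that the HIP include itself sits inside the guard, so a compiler on a non-ROCm platform never looks for the header.

```cpp
// Illustrative sketch only; not the torchao source file.
#include <string>

#if defined(USE_ROCM)
// Everything that needs HIP, including the include, lives inside the guard.
#include <hip/hip_runtime.h>

// Hypothetical helper reporting which path was compiled.
std::string swizzle_backend() { return "rocm"; }

#else
// On non-ROCm builds (for example the macOS MPS job) the HIP code is
// compiled out entirely. This stub is an assumption for illustration.
std::string swizzle_backend() { return "unavailable"; }

#endif
```

Note that a guard like this only helps when USE_ROCM is undefined. If USE_ROCM is defined but the HIP include directory is not passed to the compiler, the include still fails.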

@facebook-github-bot
Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297 mxz297 merged commit 2266451 into pytorch:main Apr 22, 2025
4 checks passed
lisjin pushed a commit to lisjin/ao that referenced this pull request Apr 22, 2025
* [ROCm][experimental] pre-shuffle weights

* add custom gemm op

* pass through swizzled

* copy paste bug causing extra matmul to execute

* correct transpose and permute logic

* swizzle.cpp is rocm-only, remove #ifndef USE_ROCM

* transpose is shallow, don't unswizzle/swizzle

* add fp8 swizzle

* remove print statement

* setup.py missing check for vec ext

* remove merge mistake

* conditionalize building sparse marlin for hip

* ruff format

* ruff check --fix

* protect swizzle.cpp inside USE_ROCM

* patch from @mxz297
@jerryzh168
Contributor

is this not fixed?

__w/ao/ao/pytorch/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:4:10: fatal error: hip/hip_runtime.h: No such file or directory
    4 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

@mxz297

mxz297 commented Apr 23, 2025

@jeffdaily @jerryzh168 This is very strange. Previously, this error

__w/ao/ao/pytorch/ao/torchao/csrc/rocm/swizzle/swizzle.cpp:4:10: fatal error: hip/hip_runtime.h: No such file or directory
    4 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

only showed up on non-AMD platforms, so we added a commit that guards the whole source file under #if USE_ROCM. After that we no longer saw this failure, and I saw clean merge signals before doing the merge.

So I am a little surprised by two things:
(1) Why did the test failure not show up before the merge?
(2) The failure is now on AMD platforms. Jeff has tested this on the OSS platform and I have tested it inside Meta, so it seems more likely that the test platform does not have a proper ROCm setup.

@jeffdaily Are you able to repro these ROCm failures somehow?

@jerryzh168
Contributor

jerryzh168 commented Apr 23, 2025

The error appears in the internal diff as well: https://www.internalfb.com/diff/D73052566 I think we should revert for now? Does this error not appear in the original PR/diff?

@mxz297

mxz297 commented Apr 23, 2025

@jerryzh168 I replied in the internal diff, but that seems to be a different failure, one that looks like it was caused by some other diff.

@petrex
Collaborator

petrex commented Apr 24, 2025

I am seeing the same error in the wheel build.
Maybe we are missing the proper toolchain or ROCm headers in pytorch/manylinux2_28-builder:rocm6.2.4? Or just some env var? @amdfaa

@HDCharles
Contributor

looking at our code, we have:

#if defined(USE_ROCM)
#include <hip/hip_bf16.h>
#include <hip/hip_fp16.h>
#include <hip/hip_runtime.h>
#endif

in tensor_core_tiled_layout.cu

Is the hip include here not gated correctly?

@jeffdaily
Contributor Author

That looks gated correctly. The CI build is missing -I/opt/rocm for some reason. The header files are there, but the flag is missing.
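
One way to tell "USE_ROCM is set but the include path is missing" apart from "USE_ROCM is not set" is a small compile-time probe. The sketch below is an assumption-laden illustration, not something from this PR: the function `describe_hip_setup` and the macro `SKETCH_HIP_HEADERS_VISIBLE` are hypothetical names. It relies on `__has_include`, which is standard since C++17, to check whether the compiler can actually see `hip/hip_runtime.h` with the flags it was given.

```cpp
// Illustrative diagnostic sketch; names are hypothetical.
#include <string>

// Probe the include path without triggering a hard error.
#ifdef __has_include
#  if __has_include(<hip/hip_runtime.h>)
#    define SKETCH_HIP_HEADERS_VISIBLE 1
#  endif
#endif
#ifndef SKETCH_HIP_HEADERS_VISIBLE
#  define SKETCH_HIP_HEADERS_VISIBLE 0
#endif

// Reports which of the three situations this translation unit was built in.
std::string describe_hip_setup() {
#if defined(USE_ROCM) && SKETCH_HIP_HEADERS_VISIBLE
    return "USE_ROCM set, HIP headers found";
#elif defined(USE_ROCM)
    // The gate is on, but the compiler was not given a path to the headers.
    return "USE_ROCM set, but HIP headers are not on the include path";
#else
    return "USE_ROCM not set";
#endif
}
```

Compiling this with the same flags as the failing job would show whether the ROCm include directory actually reached the compiler.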


__all__ = [
"dtypes",
"autoquant",
"optim",
"quantize_",
"swizzle",
Contributor

Why is this added at the top level? Should this be in prototype for now?

@@ -0,0 +1,9 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

I don't think we want to create a new folder under torchao for this tensor/op.

Contributor Author

Where do you recommend for it to go?

Contributor

Is this a prototype? We can add it to torchao/prototype for now.

atalman added a commit to atalman/ao that referenced this pull request May 5, 2025
atalman added a commit that referenced this pull request May 6, 2025
…2170)

* Revert "[reland][ROCm] preshuffled weight mm (#2044)"

This reverts commit 2266451.

* Revert "Re-land "Add INT8 SDPA path for CPU" (#2093)"

This reverts commit 137b079.
Labels
ci-no-td CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: rocm